Tasks in big data environments, such as MapReduce jobs, typically carry data-dependent constraints. The resource selection strategy of a distributed storage system tends to choose the data block nearest to the requester and ignores the server's resource load state, such as CPU, disk I/O, and network load. Based on the distributed storage system's cluster structure, data file division mechanism, and data block storage mechanism, this paper defines the cluster-node matrix, CPU load matrix, disk I/O load matrix, network load matrix, file-division-block matrix, data block storage matrix, and data block storage matrix of node status. These matrices model the relationship between a task and its data constraints. The paper then proposes an optimal resource selection algorithm with data-dependent constraints (ORS2DC), in which the task scheduling node is responsible for maintaining the base data, and MapReduce tasks and data block read tasks use different selection strategies according to their different resource constraints. Experimental results show that the proposed algorithm selects higher-quality resources for tasks and improves task completion quality while reducing the NameNode's load, which lowers the probability of a single point of failure.
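To make the contrast with nearest-replica selection concrete, the following is a minimal sketch of load-aware replica selection, assuming a 0/1 data block storage matrix (blocks by nodes), a 0/1 node status vector, and per-node CPU, disk I/O, and network loads normalized to [0, 1]. The weighted-sum score, the equal default weights, and the function name `select_replica_node` are hypothetical illustrations, not the ORS2DC selection rule itself.

```python
from typing import List, Optional, Tuple


def select_replica_node(
    block: int,
    storage: List[List[int]],   # storage[b][n] == 1 if node n holds a replica of block b
    node_up: List[int],         # node_up[n] == 1 if node n is currently available
    cpu: List[float],           # CPU load per node, normalized to [0, 1]
    disk: List[float],          # disk I/O load per node, normalized to [0, 1]
    net: List[float],           # network load per node, normalized to [0, 1]
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),  # hypothetical weights
) -> Optional[int]:
    """Return the available replica holder with the lowest weighted load, or None."""
    w_cpu, w_disk, w_net = weights
    best_node, best_score = None, float("inf")
    for n, holds in enumerate(storage[block]):
        if not holds or not node_up[n]:
            continue  # skip nodes that lack the block or are down
        score = w_cpu * cpu[n] + w_disk * disk[n] + w_net * net[n]
        if score < best_score:
            best_node, best_score = n, score
    return best_node


# Illustrative data: 2 blocks, 4 nodes; node 3 is down.
storage = [[1, 1, 0, 1], [0, 1, 1, 0]]
node_up = [1, 1, 1, 0]
cpu = [0.9, 0.2, 0.3, 0.1]
disk = [0.8, 0.3, 0.2, 0.1]
net = [0.7, 0.4, 0.1, 0.1]

print(select_replica_node(0, storage, node_up, cpu, disk, net))  # → 1
print(select_replica_node(1, storage, node_up, cpu, disk, net))  # → 2
```

For block 0, node 3 would have the lowest load but is down, and node 0 is heavily loaded, so the sketch picks node 1 regardless of which replica is physically nearest.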
Through an analysis of reliability problems in existing workflow scheduling algorithms, this paper proposes a reliability-based workflow scheduling strategy that addresses a shortcoming of some algorithms, which improve the reliability of the entire workflow by sacrificing efficiency or cost. By combining the reliability of the tasks in a workflow with task duplication, and by taking full account of the priorities among tasks, the strategy lowers the failure rate during data transmission while shortening transmission time, so it both enhances overall reliability and reduces makespan. Experiments with different numbers of tasks and different Communication to Computation Ratios (CCR) show that the reliability of cloud workflows under this strategy is better than that of the Heterogeneous Earliest-Finish-Time (HEFT) algorithm and its improved variant SHEFTEX, and that the proposed algorithm also outperforms HEFT in completion time.
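The reason duplication raises reliability can be illustrated with a minimal sketch. It assumes an exponential failure model, where one task copy running for time t at failure rate λ succeeds with probability exp(-λt), and independent failures across copies and tasks. Both assumptions are common in the scheduling literature but are not stated in the abstract, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def task_reliability(exec_time: float, failure_rate: float) -> float:
    """Success probability of one task copy under an assumed exponential failure model."""
    return math.exp(-failure_rate * exec_time)


def duplicated_reliability(copies: List[Tuple[float, float]]) -> float:
    """Probability that at least one copy succeeds, assuming independent failures."""
    p_all_fail = 1.0
    for exec_time, rate in copies:
        p_all_fail *= 1.0 - task_reliability(exec_time, rate)
    return 1.0 - p_all_fail


def workflow_reliability(tasks: Dict[str, List[Tuple[float, float]]]) -> float:
    """The workflow succeeds only if every task succeeds (independence assumed)."""
    r = 1.0
    for copies in tasks.values():
        r *= duplicated_reliability(copies)
    return r


# Each entry lists (execution time, failure rate) per copy; t2 is duplicated.
tasks = {"t1": [(10.0, 0.01)], "t2": [(20.0, 0.01), (20.0, 0.02)]}
print(workflow_reliability(tasks))
```

Duplicating t2 onto a second, less reliable processor still raises its success probability, because the task fails only if every copy fails. The cost is extra processor time, which is the efficiency trade-off the strategy above tries to limit.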